"They Are Out There, If You Know Where to Look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval

Authors

  • Raghavendra Udupa
  • K. Saravanan
  • Anton Bakalov
  • Abhijit Bhole
Abstract

It is well known that the use of a good Machine Transliteration system improves the retrieval performance of Cross-Language Information Retrieval (CLIR) systems when the query and document languages have different orthography and phonetic alphabets. However, the effectiveness of a Machine Transliteration system in CLIR is limited by its ability to produce relevant transliterations, i.e. those transliterations which are actually present in the relevant documents. In this work, we propose a new approach to the problem of finding transliterations for out-of-vocabulary query terms. Instead of “generating” the transliterations using a Machine Transliteration system, we “mine” them, using a transliteration similarity model, from the top CLIR results for the query. We treat the query and each of the top results as “comparable” documents and search for transliterations in these comparable document pairs. We demonstrate the effectiveness of our approach using queries in two languages from two different linguistic families to retrieve English documents from two standard CLEF collections. We also compare our results with those of a state-of-the-art Machine Transliteration system.
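The mining idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold, and the use of `difflib.SequenceMatcher` as a stand-in for a trained transliteration similarity model are all assumptions made for illustration, and the sketch further assumes the out-of-vocabulary query terms have already been romanized so that a character-level comparison is meaningful.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a transliteration similarity model.

    A real system would use a model trained on transliteration pairs;
    here a simple character-overlap ratio is used instead (assumption).
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mine_transliterations(oov_terms, top_documents, threshold=0.7):
    """Pair each OOV query term with its best-scoring word in the top results.

    oov_terms: romanized out-of-vocabulary query terms (assumption).
    top_documents: texts of the top results from a first retrieval pass.
    threshold: minimum similarity for a word to be accepted (illustrative value).
    """
    mined = {}
    for term in oov_terms:
        best_word, best_score = None, threshold
        for doc in top_documents:
            for raw in doc.split():
                word = raw.strip(".,;:!?\"'()")
                if not word:
                    continue
                score = similarity(term, word)
                if score >= best_score:
                    best_word, best_score = word, score
        if best_word is not None:
            mined[term] = best_word
    return mined


if __name__ == "__main__":
    docs = [
        "Mikhail Gorbachev resigned as Soviet president.",
        "The summit was held in Geneva.",
    ]
    print(mine_transliterations(["gorbachov"], docs))
```

In this sketch the query and each top result play the role of a "comparable" document pair: candidate words come only from documents that were actually retrieved, so any accepted transliteration is one that occurs in the collection.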

Similar resources

Improving Tamil-English Cross-Language Information Retrieval by Transliteration Generation and Mining

While state-of-the-art Cross-Language Information Retrieval (CLIR) systems are reasonably accurate and largely robust, they typically make mistakes in handling proper or common nouns. Such terms suffer from compounding of errors during the query translation phase, and during the document retrieval phase. In this paper, we propose two techniques, specifically, transliteration generation and mini...

Improving Cross-Language Information Retrieval by Transliteration Mining and Generation

The retrieval performance of Cross-Language Retrieval (CLIR) systems is a function of the coverage of the translation lexicon used by them. Unfortunately, most translation lexicons do not provide good coverage of proper nouns and common nouns, which are often the most information-bearing terms in a query. As a consequence, many queries cannot be translated without a substantial loss of informa...

Learning to Find Transliteration on the Web

This prototype demonstrates a novel method for learning to find transliterations of proper nouns on the Web based on query expansion aimed at maximizing the probability of retrieving transliterations from existing search engines. Since the method we used involves learning the morphological relationships between names and their transliterations, we refer to this IR-based approach as morphological...

Extracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries

Automatic translation knowledge acquisition or automatic bilingual dictionary construction has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliterations are used to translate proper names and technical terms especially from languages in Roman alphabets to languages in non-Roman alphabets such as from E...

Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery

Cross-lingual information retrieval (CLIR) allows people to find documents irrespective of the language used in the query or document. This thesis is concerned with the development of techniques to improve the effectiveness of Chinese–English CLIR. In Chinese–English CLIR, the accuracy of dictionary-based query translation is limited by two major factors: translation ambiguity and the presence ...

Publication date: 2009